
Summary statistics - summarize and boxplot

The script below shows several ways to use the summarize and boxplot commands to produce descriptive statistics. Both commands apply to numerical variables and report summary statistics such as the mean, the standard deviation, and the number of units with a valid value.
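To make these statistics concrete, here is a minimal Python sketch that computes the same kinds of values for a small list. It is not part of the microdata.no script. The income values are made up, the `summarize` helper is hypothetical, and the quartile method (`inclusive`) is an assumption that may differ from the one the platform uses.

```python
import statistics

def summarize(values):
    """Return summary statistics, ignoring missing (None) values."""
    valid = [v for v in values if v is not None]
    # Quartile method is an assumption; other methods give slightly different cut points
    q1, median, q3 = statistics.quantiles(valid, n=4, method="inclusive")
    return {
        "count": len(valid),              # number of units with a valid value
        "mean": statistics.mean(valid),
        "std": statistics.stdev(valid),   # sample standard deviation
        "min": min(valid),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(valid),
    }

# Made-up income values; None marks a unit without a valid value
income = [250_000, 410_000, None, 320_000, 580_000, 150_000]
stats = summarize(income)
print(stats)
```

A box plot draws the quartile values from such a summary (q1, median, q3) as a box, with whiskers extending toward the minimum and maximum.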

// Summary statistics for metric/continuous variables
// The summarize command shows summary statistics for metric/continuous variables, such as the mean, standard deviation, and quartile values. The boxplot command presents the same values graphically as a standard box plot.

require no.ssb.fdb:23 as db

create-dataset demography
import db/INNTEKT_WYRKINNT 2020-01-01 as income
import db/INNTEKT_BRUTTOFORM 2020-01-01 as wealth
import db/BEFOLKNING_KJOENN as gender
import db/BEFOLKNING_FOEDSELS_AAR_MND as birthdate
import db/BOSATTEFDT_BOSTED 2020-01-01 as municipality

// Derive the county code from the first two digits of the municipality code
generate county = substr(municipality,1,2)

// Compute age as of 2020 (birthdate is stored as YYYYMM, so dividing by 100 leaves the birth year)
generate age = 2020 - int(birthdate/100)

// Summary statistics for all units in the dataset, then for subsets selected with if
summarize income wealth
summarize wealth if age > 50
summarize wealth if municipality == '0301'

// Box plots for both variables, then for income split by gender
boxplot income wealth
boxplot income, over(gender)
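The two generate steps in the script can be checked by hand. The Python sketch below mirrors them under two assumptions read off the script itself: that substr counts positions from 1, so substr(municipality,1,2) returns the first two characters, and that birthdate is an integer of the form YYYYMM. The function names are hypothetical and exist only for illustration.

```python
def county_from_municipality(municipality: str) -> str:
    """First two characters of a four-digit municipality code."""
    return municipality[:2]

def age_in_2020(birthdate: int) -> int:
    """Age in 2020 for a birthdate given as YYYYMM; the month is discarded."""
    birth_year = birthdate // 100  # integer division drops the MM part
    return 2020 - birth_year

# Municipality 0301 (Oslo) belongs to county 03
print(county_from_municipality("0301"))

# Someone born in July 1985 is counted as 35 in 2020
print(age_in_2020(198507))
```

Because the month is discarded, the result is the age the person reaches during 2020, not necessarily their age on 2020-01-01.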